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Abstract 

Bayesian optimal design is considered for experiments where it is hypothesised that the responses 
are described by the intractable solution to a system of non-linear ordinary differential equations 
(ODEs). Bayesian optimal design is based on the minimisation of an expected loss function where the 
expectation is with respect to all unknown quantities (responses and parameters). This expectation 
is typically intractable even for simple models before even considering the intractability of the ODE 
solution. New methodology is developed for this problem that involves minimising a smoothed 
stochastic approximation to the expected loss and using a state-of-the-art stochastic solution to the 
ODEs, by treating the ODE solution as an unknown quantity. The methodology is demonstrated 
on three illustrative examples and a real application involving estimating the properties of human 
placentas. 

Keywords: Approximate coordinate exchange algorithm; Bayesian optimal design; Ordinary differ¬ 
ential equations. 


1 Introduction 

1.1 Modelling complex physical processes 

Often the dynamics behind a complex physical process can be approximately described by a system 
of non-linear ordinary differential equations (ODEs), where the solution to these equations provides 
a model predicting how a quantity of interest will behave with respect to time. It is assumed that 
the system of ODEs depends on some unknown physical properties (parameters) of the process in 
question and, potentially, may also depend on some additional controllable (design) factors. 

The value of the parameters may be of direct interest or we may be interested in predicting the be¬ 
haviour of the process at certain values of the design variables and/or at a certain time. In either case, 
we need to estimate the unknown parameters. An experiment can be conducted where observations 
of the quantity of interest are collected at various different times and, possibly from multiple runs of 
the experiment with different values of the design factors. To estimate the parameters, a statistical 
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Table 1: Physical parameters, 6 for the human placenta example described in Section 1.2 


Symbol Description 


01 

02 

03 

04 


Maximum uptake 

Proportion of reaction occurring through active transport 
1st reaction rate 
2nd reaction rate 


model is assumed that links the parameters to the observations via a data-generating process based 
on the solution to the ODEs (see, for example, Ramsay et ah 2007). 

It may be possible to conduct a designed experiment where the observation times and design variables 
are actively selected in advance of the experiment. Optimal design of experiments refers to this 
selection being made optimally to minimise a loss function which reflects the ultimate goal of the 
experiment, e.g. estimation of the unknown parameters or prediction of a future process. 

We consider Bayesian optimal design of experiments for ODE models. Bayesian optimal design has 
very principled foundations but can be hard to implement in practice due to the computational 
complexities involved. Firstly, it involves the minimisation of the expected loss function which will 
typically be analytically intractable. Secondly, the dimensionality of the domain of the expected 
loss function can sometimes be large since every quantity that can be specified for the experiment 
corresponds to a dimension. 


Ramsay et ah 2007 


For ODE models, these problems are compounded by the fact that, typically, the solution to the 
system of ODEs is not analytically tractable. A possible approach is to use numerical methods to 
hnd an approximate solution to the systems of ODEs (see, e.g., Iserles, 2009). However, this has two 


disadvantages. First, the numerical solution can be computationally expensive. Bayesian inference 
for computationally expensive models has begun to receive considerable attention in the Statistics 


literatnre (e.g. 

Kennedy and 0 ’Hagan 

2001 

Rasmussen 2003 Bliznyuk et al. 
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Fielding et al. 

2011 

Overstall and Woods 

2013 

). In each case, to some degree, evaluation of the likelihood (which 


depends on the computationally expensive numerical solution) is replaced by an evaluation of an 
approximation. The second disadvantage, which is perhaps more serious, is that the numerical error, 
unavoidable with numerical methods, is typically not taken account of when performing subsequent 
evaluations with the numerical solution. This issue was explained by Chkrebtii et al. (2015) who 


proposed a fully probabilistic solution to the system of ODEs. We take advantage of this methodology 
in our treatment of optimal design for ODE models. 


Overstall and Woods (2015) proposed the approximate coordinate exchange (ACE) algorithm for 
Bayesian optimal design for non-ODE models. Very briefly, the ACE algorithm uses a cyclic descent 
algorithm (see, for example, Lange, 2013, pg 171) to minimise an approximation to the expected 


loss function. The approximation used is a Gaussian process (GP) emulator htted to a Monte Garlo 
integration approximation of the expected loss. In this paper, we extend this algorithm to ODE 
models. We replace evaluation of the intractable solution to the ODEs by a value generated from 


the probabilistic solution as proposed by Ghkrebtii et al. (2015) 


2 
























































1.2 Measuring human placentas 


To aid exposition of the ideas and methodology introdnced thronghont this paper, consider the 
following application from biologists at the Sonthampton Centre for Biological Sciences (University 
of Sonthampton, UK). Interest lies in the transport of the amino acid serine within a human placenta. 
Specihcally, we are concerned by how serine moves from the outside to the inside of a portion of 
placental cell membrane (called a vesicle). To investigate, an experiment is to be performed where 
initial amounts of radioactive and non-radioactive serine (in pi) are placed inside and outside the 
vesicle. The amount of radioactive serine inside the vesicle is then measured at a series of observation 
times. The serine transport process can be described by a system of ODEs which depend on the 
initial amounts of serine and four unknown physical parameters (see Table which are of interest 
to the scientists. The solution to the system of ODEs provides theoretically predicted amounts of 
radioactive and non-radioactive serine inside the vesicle at a certain time. The practitioners have 
control over the initial amounts of non-radioactive serine inside and outside the vesicle for each 
experiment and the values of the observation times. Our task is to choose the initial amounts and 
observation times optimally with respect to the goal of estimating the physical parameters. 


1.3 Organisation of the paper 


The paper is organised as follows. In Section we describe the background to the problem including 
statistical inference for ODE models, the premise of Bayesian optimal design and a brief description of 
the ACE algorithm. In Section]^ we describe the proposed methodology for optimal design for ODE 
models including a description of the probabilistic solution to ODEs of Chkrebtii et ah (2015) and 
how this can be embedded in the ACE algorithm. In Section we apply this methodology to three 
illustrative examples where the goal is parameter estimation. Finally, in Section]^ the methodology 
is applied to the human placenta example where differing goals of parameter estimation and model 
selection are considered. 


2 Background 

2.1 Statistical inference for ordinary differential equations 


Let X G df be a vector of k design variables, i.e. a treatment, and let 0 G 0 be a p x 1 vector of 
physical parameters. Consider the following system of S ODEs which dehne an initial value problem 


u{t) 

u(To) 


f (n(t),f,0,x) 

uo 


for t G [To,Ti] 


( 1 ) 


where u(t) is the gradient vector of u(t) with respect to time t, and Uq G denotes the initial 
conditions. In ([^, f : x [To,Ti] x A —)• is a suitably well-behaved function that, at the very 


least, we assume satishes the Lipschitz condition (see, e.g. Iserles, 2009, pg 3). This means that ([^ 


has a unique solution. Note that the solution actually depends on 6 and x, i.e. u(f) = u(f; 6, x) but 
we only use the longer notation when we need to be clear that there may be more than one 0 or x. 


Now we believe that the physical process in question is governed by ([^. For j = 1,... ,M, we 
observe the process for treatment Xj, initial conditions Uoj, and at times tj = (tji,... For 
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/ = 1,..., rij, we observe the c x 1 vector of responses yji G yji, where yji denotes the c-dimensional 
sample space for the jZth observation. Let = (yji,..., y^nj) be the crij x 1 vector of responses for 
the jth treatment and let y = (yi,... ^Ym) G 3^ be the n x 1 vector of responses for the complete 
experiment where n = rij and y = U^i overall sample space. We assnme that 

y are realisations according to 

y|V>,d ~ F(V>;d) , (2) 

where F is a known probability distribntion, i/^Gd'isaPxl vector of model parameters (with 
parameter space d'), and dGPisagxl vector specifying the design (with design space V). The 
distribntion in ([^ essentially defines a statistical model. Note that we decompose t/? = (0,7), where 
7 G F is a (P — p) X 1 vector of nnisance parameters. The design, d, is the set of controllable exper¬ 
imental conditions and can inclnde the treatments xi,... ,xm; the initial conditions Uqi, ... ,Uom; 
and the observation times ..., tjnj, ior i = j,, M. In practice, some of these may be hxed by 
the protocol of the experiment. Alternatively, the initial conditions may be nnknown, and inclnded 
in the vector of parameters, i/’i either as physical or nnisance parameters. 

The dependence of the distribntion in (|^ on 0 and d is throngh the solntion of the system of ODEs 
given by ([^. The most obvions way to do this is to assnme that 


E(yj7|^,xj,tp) = , 


where Q : x 0 —yij is a known fnnction. If ^(u, 0) ^ I^u, then Ramsay et ah (2007) call this a 

distribnted partial data problem. 


As an example, consider the hnman placenta example introdnced in Section p72 
ODEs is given by 


Uiit) 

U2{t) 

Ml(0) 

M2(0) 


Xl(u2(t)+0264)-Ui{t){x2+9293) 
u* {u,t,6,x) 

X2{ui{t)+9294)-U2{t)(xi+9293) 
u* {u,t,6,x.) 

Uqi 

Uo2 


> t G [To, Ti], 


The system of S' = 2 


( 3 ) 


where 


u* {u(t),t, 0, x) 


2{xi + X 2 ){ui{t) + U 2 {t)) (1 62 ) (6'4(Xi -f X 2 ) + + U 2 it))) + 29364 

9i 


and To = 0 and Ti = 600 seconds. The solntion to this system is u(t) = {ui{t), U 2 it)) which are 
the amonnts of radioactive and non-radioactive serine, respectively, inside the vesicle at time t. The 
valnes of x = {xi,X 2 ) E X = [0,1000]^ are the amonnts of radioactive and non-radioactive serine 
ontside the vesicle at time f = 0, and uq = (moi, ^ 02 ) G [0,1000]^ are the corresponding amonnts of 
serine inside the vesicle at time f = 0. In the experiment, we are able to control X 2 and U 02 (i-e. xi 
and Moi are fixed by the experimental protocol), and the response, yji, is the amonnt of radioactive 
serine inside the vesicle at time tji for process conditions X 2 j and Uo 2 j- We assnme a statistical model 
where 

E(l/p|0,Xj,tp) = 

so that in this case ^(u, 0) = Ui. The design is the collection of initial amonnts of non-radioactive 
serine X 2 j and Uo 2 j, and the observation times tji,..., tjn^ for j = 1,..., M. 
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2.2 Decision-theoretic approach to Bayesian optimal design 


We now describe the decision-theoretic approach to Bayesian optimal design. We complete the 
statistical model given by F in ([^ by specifying a prior distribution for which does not depend on 
the design d. Once we observe y, the posterior distribution of i/’ is given by 

7r(t/?|y, d) oc 7r(y|7/?, d)7r(7/?), (4) 

where 7r(y|'i/?,d) is the mass/density function of F and 7r('i/?) is the prior density function. Note that 
the right-hand-side of Q also dehnes the joint distribution of t/? and y. The posterior distribution 
of the physical parameters is found by marginalising (|^ with respect to the nuisance parameters. 

Bayesian optimal design relies on the specihcation of an appropriate (for the goal of the experiment) 
loss function denoted by A('i/?,y,d) which depends on the design and, potentially, the unobserved 
responses and unknown parameters. An optimal design, d*, is given by a value of d that minimises 
the expectation of A('i/?,y,d) with respect to the joint distribution of i/? and y (given d), i.e. 

d* = argminL(d), 
dec 

L(d) = E(A(tA,y,d)|d), 

= J A(tA,y,d)dP^,y|d. 

For example, suppose we were interested in posterior point estimation of the elements of 0, then an 
appropriate loss function might be the squared error loss (SEL) given by 

AsEL(V^,y,d) = {01 - E(0;|y,d))^ (5) 

1=1 

which does not depend on the nuisance parameters. It can be shown that 

-^SEL (d) = E(AsEL(V>,y,cl)|d), 

= j tr(var(0|y,d))dPy|d, 

and so the optimal design minimises the expected trace of the posterior variance matrix of 6. 

As mentioned in Section we are faced with three problems when trying to minimise the expected 
loss function; 


• high dimensionality of the design space, V] 

• intractability of the integration required to evaluate T(d) and A (-0, y, d); 

• both evaluation of A(0,y,d) and the joint distribution of y and 0 will typically depend on 
the intractable solution to the system of ODEs. 


The approximate coordinate exchange (ACE) algorithm, proposed by Overstall and Woods (2015), 
is a solution to the hrst two problems and is described in the next section. 
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2.3 Approximate coordinate exchange algorithm 


To overcome the problem of the high dimensionality of the design space, the coordinate exchange 


(CE; Meyer and Nachtsheim 1995) algorithm is used to minimise the expected loss function, -h(d). 
This is the same as a cyclic descent algorithm where T(d) is minimised, sequentially, over each 
element (or coordinate) of the design space, where all other elements are held fixed. This process is 
then repeated until convergence. 

However, instead of minimising the intractable T(d), at each iteration, the ACE algorithm minimises 
an approximation, T(d). Consider the Monte Carlo (MC) approximation to T(d): 


1 


B 


LB(d) = p 


( 6 ) 


i=l 


where is a sample generated from the joint distribution of i/? and y, given d. The Ts(d) 

is a consistent and unbiased estimator of T(d). However, there are at least two reasons why ^^(d) 
would be a poor choice for T(d). First, Lsid) is a stochastic approximation and so is a non-smooth 
function. Secondly, Lsid) is computationally expensive. This problem is aggravated by the fact 
that, in some cases, the loss function is, itself, an intractable function requiring MC approximation. 


Instead, Overstall and Woods (2015) proposed constructing an approximation (or emulator) for T(d) 
based on a “small” number of evaluations of Lsid). One of the most common types of emulator 
is the Gaussian process (GP) emulator. The use of GP emulators in more general optimisation 


problems dates back to, at least, the expected improvement approach of Jones et ah (1998). The GP 


emulator provides a predictive distribution for T(d) and we set T(d) to be the predictive mean of this 
distribution. This approach of smoothing the MG approximations can be seen as an extension to the 


approach of Muller and Parmigiani (1996), for minimising the expected loss, to higher dimensionality 


design spaces, by using the GE algorithm. A similar approach was used by Gotwalt et ah (2009) 


who used a deterministic quadrature rule to evaluate the log determinant of the Fisher information 
matrix averaged over a prior distribution (a commonly-used classical objective function for optimal 
design) and then maximised the resulting approximation over the design space using GE. 

The actual algorithm is provided in Appen dix [A Note that a GP emulator, like all statistical models, 
can £t inadequately. Bastos and O’Hagan (12009 ) developed diagnostics to assess the adequacy of GP 


emulators. However, applying these methods automatically within the AGE algorithm is infeasible. 
Instead, once T(d) has been minimised, we, independently of the GP emulator, decide whether to 
accept the change to the current design before we move onto the next element/coordinate in the 
AGE algorithm. This accept step is accomplished using a Bayesian hypothesis test. For more details 
on this feature, on the overall AGE algorithm, and on the wider issue of Bayesian optimal design. 


see Overstall and Woods (2015). 


3 Methodology 


In this section we describe how the AGE algorithm can be extended to find Bayesian optimal designs 
for ODE models. 


In Section 3.1, we briefly describing the method of Ghkrebtii et ah (2015) for hnding a probabilistic 
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solution to a system of ODEs and then, in Section |3.2 
in the ACE algorithm. 


describe how this solution can be embedded 


3.1 Probabilistic solution to ODEs 


Let Rx denote a square integrable kernel function with length scale parameter A G (0, oo). Further¬ 
more, let 


Qxitiyh) 

Sx{tl,t2) 


WxitiM) 


V\{tiR2) 


a 


a 


a 


^ Rx{s,t2)ds, 

a 

/ oo 

Rx{sRi)Rx{sR2)<^s, 

■OO 

/ OO 

Qx{sRi)Rx{sR2)<^s, 

■OO 

poo 

/ QA(s,ti)QA(s,f2)ds, 


where a > 0. Let u(f) = ... ,us{t)), then the central assumption of the method of Chkrebtii 


et al. (2015) is that Us{t) and its time derivative, 'Us(f), have a joint GP prior, i.e. 


iisi-) 

Us{-) 


~ GP 


rhs{-) 

ms{-) 


lPA(-,-r Vx(;-) 


(7) 


independently, for s = The probabilistic solution to the system of ODEs given by ([^, 

conditional on 6, a and A, is constructed by updating the joint distribution given by ([^ sequentially 
over a discrete grid of N time points r = (ri,..., tn), where Tq = ti < T 2 < ■ ■ ■ < = Ti. 

Let Ti:r = ('Ti, • • •, Tr) be the r X 1 vector of time points up to and including the rth time point r^, 
for r = 1,..., A^. The algorithm for the sequential update is as follows: 


1. Compute the S' x 1 row vector fi = f (To,Uo,0) giving the gradient at the initial time point. 
To = Ti, and let Ai = 0. 

2. For r = 1,..., A^ — 1 complete the following steps: 

(a) Compute the S' x 1 vector giving the predictive mean of u(rr+i) as 

m(r^+i) = uo + 1Pa(a+i, Ti,^) (Sa(ti:^, Ti,^) A^)"^ F^, 

where is the r x S' matrix with gth row given by fg for q = I,... ,r. Compute the 
common predictive variance of Us{Tr+i) as 

Tj.-)-i) Va(Tj.+1, Tj--|-i) 1Ta(Tj.+1, T 1:j.) (Sa(t l:r5 l:r) T Aj.) 1 Ta(A ’+15 l:r) • 

(b) For s = I,..., S, generate a prediction of the solution at r^+i, Ms(a-+i), from the predictive 
distribution, i.e. 

Us{Tr+l) ~ N {ms{Tr+l), C^Tr+l, T^+l)) , 
where ms{Tr+i) is the sth element of m(rr+i). 
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(c) Compute the true gradient vector at time r^+i and solution u(rj,+i) as 


fr+l = f {Tr+l, u{Tr+l), 0) . 

(d) Compute the common predictive variance of the time derivative 

C(’tr+l) "^r+l) (Tj-, TfTi:}.) “1“ Aj.) S'Ti-r') , 

and let A^+i = diag < A^, C'(rr+i, • 


Once step 2 is complete, then a probabilistic solution is given by 


n.(-)~GP K(-),C(-,-)), 

for s = 1,..., S, where is the sth element of the vector of predictive means given by 

m(-) = uo + Wx{-, t) {Sx{t, t) + An)~^ Fat, 
and C'(-, •) is the common predictive variance given by 

c(-. ■) = n(-. •) - in(-, r) (s,(x. t) + Aa,)-‘ w,(; tY. 


The methodology holds for general kernel functions. However, Chkrebtii et ah (2015) suggest two 


example kernel functions, the squared exponential kernel function given by 


Rx(ti,t 2 ) = exp 


(^1 


0)2 


2 A 2 


and the uniform kernel function given by 


-Ra( 0;0) — -f (O £ (0 ~ + -^)) , 

where I{A) is the indicator function for the event A. Simplistically, the best choice of kernel function 
depends on the assumed smoothness of the u(t). The squared exponential kernel function is infinitely 
differentiable and can be used for when we expect u(t) to be smooth. On the other hand, if u(f) 
is non-smooth then a better choice is the uniform kernel function which is not differentiable. For 
closed form expressions for the functions Qx, Sx, Wx and Vx, under both the squared exponential 


and uniform kernel functions, see Chkrebtii et al. (2015). 


Consider the system of ODEs, given by ([^, that describe the transport of serine in a human placenta. 
Under the squared exponential kernel function, physical parameters 6 = (100,0.05,100,100), initial 
values uq = (0,1000), treatment x = (7.5,1000), and r containing N = 501 evenly spaced time 
points. Figure shows 1000 draws from the probabilistic solution of Ui{t) and U 2 {t) plotted against 
t G [TojFi] = [0,600]. Note how the uncertainty in the solution increases as time, t, increases away 
from t = Tq where we know, in this example, the true value of u(f). 


Chkrebtii et al. (2015) propose several MCMC algorithms for generating a sample from the posterior 


distribution of the model parameters xjx = (0, 7 ) and the auxiliary parameters a and A, given existing 
experimental responses y. This is accomplished by embedding the probabilistic solution into standard 
MCMC algorithms. 














Figure 1: Plots showing 1000 draws from the probabilistic solution of ui{t) and U 2 {t) against t for the system 
of ODEs, given by (§, that describe the transport of serine in a human placenta 




3.2 Extending the ACE algorithm to ODE models 


At various points of the ACE algorithm we are required to evaluate a Monte Carlo approximation 
of the expected loss function, i.e. as given by ([i). First, this requires a sample to be 

generated from the joint distribution of y and i/’- This is accomplished by generating a value "0^ from 
the prior of 0 and then generating a value y* from the distribution F(0j, d) (see equation (|^). This 
will require B evaluations of the intractable solution of the system of ODEs, for the jth treatment 
and time points: tji,..., tjn^ for j = 1,..., M. Secondly, we need to evaluate the loss function at 
each value in the sample {yi,0i}^i- Unfortunately, the most commonly used loss functions are, 
themselves, intractable. For example, the squared error loss function given by (|^ depends on the 
posterior mean, E(0;|y), of each element of the vector of physical parameters, 0. 

Woods (2015) used a Monte Carlo approximation of the posterior mean as follows 


Overstall and 


E(0z|y) = 


Ej=i ^jW(y|0^. 

Ef=i7r(y|0,) 


where 


is an additional sample generated from the prior distribution of 0 where 0,- = 


i=i 

and 9ji is the Ith element of 0j for I = 1,... ,p. Evaluation of E {0i\yi), in the loss function, 

for i = 1,..., 5, is now replaced by evaluation of E {6i |yi) to give a nested Monte Carlo approximation 
to the expected loss function. A further B evaluations of the intractable solution, u(f), will be 
required for each treatment and for each time point. Therefore, in total, we need 2B evaluations of 
u{t) for each treatment and for each time point, one for each vector of physical parameters in the 

samples {0*}^]^ and |0j| • Thus we are required to evaluate 


Uiji = u(07;0i,x0. 


r 


for i = 1,..., 2B, j = 1,..., M and I = 1,... ,n 

The basic idea is to replace evaluation of the unknown u(f) in the ACE algorithm by a value generated 
from the probabilistic solution outlined in Section 3.1[ However, since B will be in the order of 1000s, 


9 














it will be computationally infeasible to use the full probabilistic solution where both the physical 
parameters, 6 , and auxiliary parameters, a and A are unknown with prior distributions. However, if 
we are prepared to £x the values of the auxiliary parameters, then signihcant computational savings 
can be found thus making the method feasible. 


To see this note that we can rewrite the predictive mean of u(rr+i) in step 2(a) in Section 3.1 
follows 

m(r^+i) = uo 


as 


ajF, 


where a,. = (S'a(ti.,,, ti:^) + is a r x 1 vector which does not depend on 6. Also 

note that, in step 2(a), the common scalar predictive variance of Us{Tr+i), for s = 1,..., S', denoted 
as Cr = C{Tr+i, Tr+i), also does not depend on 0. This means we can pre-compute both a,, and C^, 
for r = l,...,A^ — 1, in advance of running the ACE algorithm. 


In summary, before starting the ACE algorithm, an initial phase is completed as follows. 


Initial phase 


1. Set Ai = 0. 

2. For r = 1,..., A^ — 1 compute the following quantities: 

ar (Tj.+1 , Ti:j.) , 

Cr = Vx{Tr+l,Tr+l) - IEa(t^+i, Ti,^) (S'a(Ti,^, Ti,r) + K)~^ W^A(t^+I, Ti,^)'^, 

A^+i = diag |A^,C'(rr+i,' 

where 

= (S'a(ti:^,T1:^) A^)"\ 

3. Compute Bat = (S'a(t, t) -|- Aat)”^. 

4. For j = 1,, M, compute the Uj x N matrix 

Bj = Wx{tj,T)BN, 

and the nj x nj matrix 


Ej = Vx{tj,tj) - Wx{tj,T)BNWx{tj,T) 


Now, embedded in the ACE algorithm, we can produce a probabilistic solution for Uiji for i = 
1,..., 2B, j = 1,..., M and / = 1,..., n^, as follows 

Main phase 


1. Let fi = f(To,Uoj,0i). 

2. For r = l,...,A^ — 1, complete the following steps. 
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(a) For s = 1,..., S', generate 
where 

(b) Compute 

3. For s = 1,..., S', generate 
and let 


Usijr+l) ~ N {ms{Tr+l), Cr) , 
m(r^+i) = Uoj + a^F^. 

fr+l f ("^r+l) u(Ty-|_i), Oi'j. 

U* ~ N {Uojs + DjFtv, Ej) , 
Up7 = {uh, . . . , Ugi) , 


where u*i is the sth element of u*. 


In the above algorithm note the dependence on the initial values Uq^, for the jth treatment, i.e. the 
initial values are known. In some situations, the initial values will be unknown and become part of the 
inference problem, i.e. they are given a prior distribution which we update to a posterior distribution 
in light of the experimental responses. If that is the case, then we can replace all occurrences of uoj 
by a value, uojj, generated from their prior distribution, as we do with the unknown parameters. 


4 Illustrative Examples 

4.1 Preliminaries 


In this section we demonstrate the extended ACE algorithm on three illustrative examples featuring 
systems of ODEs: 


1. Compartmental model (Section [4^ ; 

2. FitzHugh-Nagumo equations (Section |4.3[ ); 

3. JAK-STAT mechanism (Section |4.4[ ). 


In each example, designs are found under the three different loss functions as described below. For 
each loss function, let i where is an additional sample generated from the 

prior distribution of i/’- 


Squared error loss (as described in Section 2.2). 
Absolute error loss (AEL) given by 


AAi;L(y,e,d)=^|9,-M(9,|y)|, 


1=1 
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where M(0;|y) is the posterior median of 6 i. The posterior median is intractable and is approx¬ 
imated as follows. For j = 1,... ,B, let 


Wj = 




Ef=i 7r(y|V’i 


and let ^((i) <•■■< 6 *^( 5 ) be the ordered values of 61 in the sample 
mation to the median is given by 



Then an approxi- 


M{9i\y) 



9i(z) + 9, 




1 


where z = max {j = 1,..., B\wj < 1/2}. We then approximate the absolute error loss function 
by replacing M( 6 '/y) by M{9i\y). 

• Self-information loss (SIL) given by 


AsiL(V’,y,d) = log7r(0) - log7r(0|y,d), 

= log7r(y|d) - log7r(y|0,d), 

= log j 7r(y|0,7,d)7r(7,0)d7d0 - log j 7r(y|0, 7 , d)7r(7)d7. ( 8 ) 

Both of the integrals in (|^, denoted as 


Ji = y 7r(y|0,7,d)7r(7)d7, 

= j 7r(y|6>,7,d)7r(7,0)d7d6/, 


are analytically intractable. However we can approximate them using Monte Carlo integration 
as follows. 

1 ^ 

h = 

j=i 
1 ^ 
j=i 

We then replace evaluation of Ji and I 2 in the self-information loss function by Ji and J 2 , 
respectively. 


Similar to approximating the squared error loss in Section 3.2, both of the nested Monte Carlo 


schemes for approximating the absolute error and self-information loss require 2B evaluations of u(t) 
for each treatment and time point, which we now replace by a value generated from the probabilistic 
solution. For the probabilistic solution, the discrete grid of points denoted by r will be a set of N 
equally-spaced points where the absolute difference between any two values is denoted by h. Unless 
stated otherwise, for the auxiliary parameters, A = Ah. 
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4.2 Compartmental model 


Compartmental models are used in pharmokinetics to understand how drugs behave inside a body. 
The open one-compartment model with hrst-order absorption is described by the following system 
of S' = 2 ODEs for t G [0, 24] hours 

ui{t) = -6iUi{t) 

U2it) = i02/03)ui{t) - e2U2it) (9) 

u(0) = (T>,0) 

where Ui{t) and U 2 it) are the amounts of drug outside and inside the body respectively, D is the 

known initial dose and 6 = {6i, 62, O2) are unknown parameters. 

The system in ([^ is a homogenous linear system with constant coefficients meaning it can be solved 
analytically as 


Ui{t) = Dexp{—9it), 

DO 

“2(^) = (exp(-0if) - exp(- 02 t)) • 


For this example, we find and compare designs under the three loss functions from Section 4T, and 
under both the exact and probabilistic solutions to the ODEs. 


This type of compartmental model (or variants of it) is often used to demonstrate optimal experi¬ 
mental design methodology (see, for example, Atkinson et ah 1993 Gotwalt et ah 2009 Ryan et ah 
Overstall and Woodsj2015). We follow the setup of|Ryan et ah (2014) and Overstall and Woods 


2014 


(2015) where D = 400, n = 15 and 


log^z ~ N(/ii,0.05), 


independently, for / = 1, 2, 3 with (/xi, /i 2 , /is) = (log 0.1, log 1, log 20). The amount of drug inside the 
body, Ui is observed at observation time ti and we assume the following statistical model 


Vi 


N {U2{ti),a‘^ T^U2{ti)^) , 


( 10 ) 


independently, where cr^ = 0.1 and = 0.01. Note that (10) implies that Q{u,6) = U2- The design 
only involves the n observation times: ti,... ,tn- An added stipulation is that the observation times 
have to be at least 15 minutes apart. Such constraints are straightforward to incorporate into the 
ACE algorithm (see Overstall and Woods 2015). 


Since we know that u(/:) is smooth we employ the squared exponential kernel function. The discrete 
grid, T, is of size N = 501. The auxiliary parameter a is fixed at N. 


For each loss function, we compare the designs found under the exact and probabilistic solutions 
(using ACE) to a uniform designs of n = 15 equally-spaced time points in [0, 24] hours. The top row 
of Figure shows boxplots of twenty evaluations of the Monte Carlo approximation to the expected 
loss for the uniform design and the optimal design found under each of the loss functions. There 
is negligible difference between the designs found under the exact and probabilistic solutions and 
these designs are clearly superior to the uniform designs. In the bottom row of Figure are the 
observation time points associated with the designs under comparison. The optimal designs appear 
to favour having a set of observation times near the peak of U 2 {t) at f ~ 2.5 hours and then a 
series of observation times at the end of the observation interval. Of the points near the peak, the 
optimal design under self-information loss has two distinct sets whereas the designs under squared 
and absolute error loss have just one set of points. 
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Figure 2: Plots summarising the results from the compartmental model in Section 4.2 The top row show 


boxplots of 20 evaluations of the Monte Carlo approximation to the expected loss for the uniform design 
and the optimal designs (for the exact and probabilistic solution) found under each of the loss functions. 
The bottom plot shows the three designs found under each of the loss functions and the uniform design. In 
the background to the plot in the bottom row is 100 draws from the exact solution, U 2 {t), giving the amount 
of drug at time t, for 100 values drawn from the prior distribution of 6. 

Self-information loss Squared error loss Absolute error loss 





LD 

Cvj 


■D 

O 


O 

E 

< 



Time (h) 


X SIL-exact v SEL-exact o AEL-exact • Uniform 

+ SIL - approximate ^ SEL - approximate O AEL - approximate 


14 























4.3 FitzHugh-Nagumo equations 


The FitzHugh-Nagumo equations (FitzHugh 1961 and Nagumo et ah 1962) aim to describe the 
behaviour of spike potential in the giant axon of squid neurons. They are given by the following 
system of S' = 2 ODEs for f G [0, 20] ms 


Ul{t) = Osiuiit) -Ui{t)^/3 + U 2 {t)) 
U 2 {t) = - {ui{t) - 6 i + 62 U 2 {t)) /Os 

u(0) = (-1,1) 


where Ui{t) is the voltage across the axon membrane, U 2 {t) is the recovery variable giving a summary 
of outward current and 6 = ( 6 *i, 6 *i, 6 * 3 ). 


The experiment involves measuring the voltage, yi, at time ti, for i 


Ramsay et ah (2007), the following statistical model is assumed 


1,..., n = 21. Following 


l/i ~ N , 


( 11 ) 


independently, where a ~ U[l/2,1]. Furthermore, we assume the following prior distributions for 
the unknown parameters: 9i ~ U[0,1], 62 ~ U[0,1] and 6^3 ~ U[l, 5]. In ( [II| ), ^(u, 6) = Ui. 


As noted by Ramsay et ah (2007), the solution to the FitzHugh-Nagumo equations can alternate 
between smooth evolution and sharp changes of direction. For this reason we employ the uniform 
kernel. The discrete grid is of size N = 200 and the auxiliary parameter a is hxed as N. 


The design consists of the n observation times: ti,... ,fn- Similar to Section |4^ , we stipulate that 
the observation times have to be at least 0.25ms apart. We hnd designs under each of the loss 
functions in Section |4.1[ In each case, we compare the optimal design to a uniform design of n 
equally spaced points in [0, 20]ms. Figure]^ shows boxplots of twenty evaluations of the Monte Carlo 
approximation to the expected loss for the uniform design and the optimal design found under each 
of the loss functions. In each case, there is a clear improvement to be made on using a uniform design. 
Also shown in Figure]^ are the four designs under comparison. Both the squared and absolute error 
optimal designs have a signihcant number of frequent observations at the beginning of the experiment. 
For example these designs have around a third of their observation times before 2.5ms. On the other 
hand, the self-information loss and uniform designs only make 3 observations before this time. A 
feature of all of the optimal designs is that they make no observations between about 2.5 and 6 ms. 
The remaining observation times appear roughly evenly spaced. The background to this plot shows 
100 draws from the probabilistic solution, giving the voltage at time t, for 100 values drawn 

from the prior distribution of 0. It appears that the initial phase of high frequency observations is to 
learn about the steep increase in voltage for small t. The remaining set of evenly spaced observation 
times is for learning when the voltage has high noise due to parameter uncertainty. 


4.4 JAK-STAT mechanism 

The JAK-STAT mechanism describes a set of biochemical reactions of STAT-5 transcription factors 
in response to binding of the Erythropoietin hormone to cell surface receptors. The system of S' = 4 
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Figure 3: Plots summarising the results from the FitzHugh-Nagumo equations in Section 4.3 The top row 


show boxplots of 20 evaluations of the Monte Carlo approximation to the expected loss for the uniform 
design and the optimal design found under each of the loss functions. The bottom plot shows the three 
designs found under each of the loss functions and the uniform design. In the background to the plot in the 
bottom row is 100 draws from the probabilistic solution, ui{t), giving the voltage at time t, for 100 values 
drawn from the prior distribution of 6. 
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ODEs, for t G [0, 60], is 


iiiit) = —9iUi(t)n(t) + 294U4(t — oj) 

U2(t) = 9iUi{t)Kit) - 92U2(tY 

Usit) = -93U3{t) + ^92U2itf 

itiit) = 93U3{t) — 94U4{t — u) 

u(0) = (moi, 0,0,0) 


where Mqi and u are unknown, and u^lt) = 0 for t G [—ca, 0]. The examples thus far have been initial 
value problems (IVPs) whereas the system above is an example of a delay initial function problem 
(DIFP). After gene activation in the cell nucleus, the transcription factors revert to their initial state, 
returning to the cytoplasm for the next activation cycle. This stage is explained by the unknown 
time delay denote by oj. 


The function Q is given by 


Q{n,e) 


^ 9 ^{ u 2 + 2^3) ^ 

9q{ui +U 2 + 2 U 3 ) 

Ml 

\ U3/{U2+U3) 


An experiment has already been conducted (Swameye et ah, 2003) 


n = 16 measurements of the hrst two elements {Qi and Q 2 ) of ^ at a 
ti,... ,tn- A further observation of Q 3 is made at time t = 0 and 
statistical model is assumed 


which consisted of two series of 
set of distinct observation times: 
of Q 4 at time t*. The following 


{yu,y 2 i) ~ N(^(u(fi),0)i:2,Ci) , 
2/3 ~ N (^(u(fi),0)3,a3) , 

2/4 ~ N (^(u(/:i),0)4,a4) 


independently, for z = where Cj = diag {a a^i}. 

) for analyses of these data. In these papers, 
experimentally determined, i.e. they are known. 


et al. (2015 


See Pane et al, 


a. 


2ii 


(|2009|) and Chkrebtii 

nj 


(J 3 and (J 4 (for z = 1 ,. 


are 


The design task considered here will be to optimally hnd a follow-up design based on information 
on the parameters from the existing data. We assume the same statistical model as above and hnd 
ti,... ,tn and t*. In the terminology of Sections and the prior distribution for 0 , u and zzqi is 
the posterior distribution for the existing data as per the analysis of Chkrebtii et al. (2015). Instead 


of the variance parameters being hxed, we assume that afj 

0-1 

~ U[0,0.1] 

<72 

~ U[0,0.1] 

<73 

~ U[0,20] 

(74 

~ U[0,0.1]. 


These prior distributions are consistent with the experimentally determined values from the original 
analysis. 
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Note that the system of ODEs depends on the forcing function K{t). This is unknown, but has been 
measured at 16 time points. Following Chkrebtii et ah (2015), we assume these measurements are 


without error and use a GP to give a probabilistic approximation to K{t) at any value of t G [0, 60]. 
The nature of the DIFP does introduce an added complexity to our implementation of the proba¬ 


bilistic solution. In step 2(b) of the main phase of the algorithm in Section 3.2 we are required to 
compute 

fr+l f ("T-l-l) u(T’-I-i) ) ) • 

In an IVP, this is dependent on u(rr_|_i) which is generated from a normal distribution in step 2(a). 
However in the DIFP, to compute G+i, we also need u^^Tr+i — tUj), where Ui is a value generated from 
the prior (posterior from original analysis) distribution of u. If r^+i — cjj < 0, then u^^Tr+i — oJi) = 0 
by the initial conditions of the system of ODEs. For r^+i — cuj > 0, in the probabilistic solution 


of Chkrebtii et ah (2015), the conditional distribution of M 4 (r,.+i — oJi) can be derived and a value 
generated. However, this will be computationally expensive to incorporate in the implementation 
of the probabilistic solution described in Section 3^ Instead, if r^+i — oJi > 0, then we replace 
U 4 {Tr+i — OJi) by U 4 {Tf) where 


r = arg mm +1 — ooi — Tr 


i.e. from the series of M 4 (ri),.. 
time Tf that is “closest” to r^+i 


,U 4 {Tr+i) generated in step 2(a), thus far, we choose the value at 

- OJi. 


As noted by Chkrebtii et ah (2015), the time delay can cause discontinuities in the derivative meaning 
we emply the uniform kernel function. The discrete grid, r, is of size N = 500, and the auxiliary 
parameters are A = 0.085 and a = 8000 which are consistent with their posterior distribution from 
the original analysis. 

We use the extended ACE algorithm to generate designs under the three loss functions from Sec¬ 
tion 4.1 and compare these designs to the original design used in the experiment of |Swameye et ah 
(2003). As in the previous examples, the observation times need to be at least 1 second apart. Fig- 


ure 1^ shows boxplots of twenty evaluations of the Monte Carlo approximation to the expected loss for 
the original design and the optimal design found under each of the loss functions. In each case, there 
is a clear improvement to be made on repeating the experiment with the same set of observation 
times. Also shown in Figure are the four designs under comparison. Clearly the optimal designs 
favour having a dense set of points early in the observation window and then a smaller set of times 
at the end of the window. This is especially true for the designs under the squared and absolute 
error loss where 75% of the observation times are less than 15 seconds, compared to about 60% for 
self-information loss and 50% for the original design. It appears that the early observation times 
can learn about the peak in Qi and the sharp decrease in Q 2 at and up to 10 seconds, respectively. 
For the single observation time, t*, of Q 4 , the optimal designs clearly favour making a very early 
observation. Note that t* for each of the optimal designs is between 1 and 2 seconds. 


5 Application: Transport of serine across human placenta 


Now consider the human placenta example introduced in Sections PI and |2.1[ By the protocol of 


the experiment, the initial amounts of radioactive serine inside (moi) and outside (xi) are hxed as 0 
and 7.5, respectively. 
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Figure 4: Plots summarising the results from the JAK-STAT example in Section 4.4 The top row show 


boxplots of 20 evaluations of the Monte Carlo approximation to the expected loss for the original design and 
the optimal design found under each of the loss functions. The bottom plot shows the three designs found 
under each of the loss functions and the original design. In the background to the plot in the bottom row is 
100 draws from Q\^ Q 2 and ^ 4 , at time t, for 100 values drawn from the prior distribution of 6, uqi and lo. 
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We consider M = 2,... ,7 placentas and let rij = n* = 8, for all j = 1,..., M. We also fix tji = ti 
for I = 1,... ,nj and j = 1, • •., M. The following statistical model is assnmed for the experimental 
responses 

Vji = + eji, ( 12 ) 

for j = 1,..., M and / = n*, where Xj = (xi, X 2 j), 

and 0j are the physical parameters for the jth placenta. For d = 1,... ,p, we specify a mnltiplicative 
hierarchical prior strnctnre for 6j as 

where q ~ U [0,0.05], and 6 = (6*i ,... ,6p) are the popnlation physical parameters. For the latter, 
we assnme the following independent prior distribntions for each element 

9d ~ TYi[ai,bi], 

where Tri[a,6] denotes the symmetric triangle distribntion on the interval [a,b]. We assnme oi = 
02 = 04 = 80, 6i = &2 = ^4 = 120, 02 = 0.02 and &2 = 0.08. This reflects actnal prior knowledge on 
the valne of these parameters from past experiments. For the response variance, we assnme 

~ U[0,1]. 


We expect the solntion to be smooth so nse the sqnared exponential kernel fnnction. The discrete 
grid, T, is of size N = 601 and the anxiliary parameter a = lOA^. 


Specifying a design corresponds to specifying the M = 7 experimental conditions 0 : 21 , • • • ,X 2 m, and 
initial valnes M 021 , • • •, uo 2 M, as well as the common n* = 8 observation times fi, ..., tn*- Therefore, 
the design space has 22 dimensions. 


For each valne of M, we nse the ACE algorithm to hnd designs nnder the three loss fnnctions from 
Section |4.1[ In addition, snppose a qnestion of interest concerns whether the reaction rates are 


symmetric, i.e. is 63 = 6 ^ 4 ? To answer this qnestion, we dehne two models: mi (where 63 = 64) and 
m 2 (where 6*3 7 ^ 64). An appropriate loss fnnction, termed the Model selection loss (MSL), is 


A(y, m, d) = 1 — /(m = m). 


where m E M = {mi, m 2 } denotes the model and m = argmaXmeA 47 r(m|y) is the model that 
maximises the posterior model probability. The posterior model probability of model m is given by 


7r(m|y) 


7r(y|m)7r(m) 


where 7r(m) is the prior model probability of model m and 




(13) 


for j = 1,2, with 0^ being the parameters nnder m,-. Under j = 1,2, the integration in (13) is 


intractable bnt can be approximated nsing a Monte Carlo approximation in a similar way to I 2 
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Figure 5: Plots summarising the results from the placenta example in Section Shown are boxplots of 20 
evaluations of the Monte Carlo approximation to the expected loss for the proposed design and the optimal 
design found under each of the loss functions and each value of M. 
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Table 2: Initial concentrations (to nearest integer) of non-radioactive serine inside {uq 2 = '^^ 2 ( 0 )) and outside 
(X 2 ) each of the M = 7 placentas for the optimal designs found under the four loss functions and the 
proposed design. 



Self-information 

Squared 

error 

Absolute 

error 

Model selection 

Proposed 

Placenta 

X2 

U02 

X2 

U02 

X2 

U02 

X2 

U02 

X2 

um 

1 

0 

1000 

0 

0 

0 

0 

0 

0 

0 

0 

2 

0 

1000 

0 

0 

0 

0 

0 

105 

250 

0 

3 

0 

1000 

0 

58 

0 

40 

0 

169 

250 

250 

4 

0 

1000 

0 

56 

0 

67 

302 

0 

250 

1000 

5 

0 

1000 

217 

952 

188 

932 

306 

457 

1000 

0 

6 

0 

1000 

215 

1000 

184 

1000 

755 

1000 

1000 

250 

7 

0 

1000 

194 

1000 

207 

1000 

570 

117 

1000 

1000 


under the self-information loss. In each case, as in the previous examples, the observation times need 
to be at least 5 seconds apart. 

Figure]^ shows boxplots of twenty evaluations of the Monte Carlo approximation to the expected loss 
for the optimal design found under each loss function plotted against M = 2,..., 7. For M = 7, also 
shown is a boxplot of twenty evaluations of the Monte Carlo approximation for a design (see Table 
proposed by the biologists at the Southampton Centre for Biological Sciences. For M = 2,... , 6 , 
the corresponding boxplots show evaluations of the Monte Carlo approximation for twenty designs 
formed by randomly choosing M rows from the proposed design. As expected, the expected loss 
decreases with the number of placentas, M. Also, the optimal designs are clearly superior to the 
proposed designs. However, it should be noted that, for each loss function, the optimal design is 
expected to gain more information from M = 2 placentas than the proposed design is expected to 
do so from M = 7 placentas. 

We now look at the case of M = 7 placentas in more detail. The initial concentrations of non¬ 
radioactive serine inside (^ 02 ) and outside {X 2 ) each of the M = 7 placentas for the optimal designs 
found under each of the four loss functions are shown in Table IH Also shown are the conditions 
for the design proposed by the biologists. Figure show the optimal (for each loss function) and 
proposed observation times. Each row shows 100 draws from Ui{t) plotted against t for 100 draws 
from the prior distribution of 6 and 7 , for each placenta and initial concentrations shown in Table 

Under the self-information loss. Table shows that the optimal design has replicates of the same 
initial concentrations of non-radioactive serine. Figure shows that, compared to the other designs, 
these initial concentrations lead to a larger maximum value of Ui{t). Note that the between-placenta 
variability of Ui{t) in Figure]^ is due to the placenta-specihc physical parameters, 6j. The optimal 
observation times under the self-information loss occur in two clusters, just before and after the peak 
in ui(t). 

The designs under the squared and absolute error loss functions appear similar. In both cases they 
are superior to the proposed design. The initial concentrations in Table lead to three distinct 
prohles of Ui{t) (Placentas 1 and 2; 3 and 4; 5, 6 and 7). The prohle for placentas 1 and 2 has a 
slow steady increase in ui{t) with respect to t. Placentas 3 and 4 have a steep initial increase and 
subsequent decrease in ui{t) with respect to t. Finally, placentas 5 to 7 has a steep initial increase in 
Ui{t) with respect to t followed by a slow decrease. The optimal observation times are predominantly 
at the beginning of the observation window. 
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Figure 6: Plots summarising the results from the placenta example in Section Shown are 100 draws from 
ui{t) plotted against t for 100 values drawn from the the prior distribution of 6, for each of the M = 7 
placentas and experimental conditions given by Table 
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The initial concentrations of the optimal design under the model selection loss result in two distinct 
prohles of For placentas 2, 3 and 5 to 7, Ui{t) has a slow steady increase in Ui{t) with respect 

to t. Placentas 1 and 4 have a steep initial increase and subsequent decrease in Ui{t) with respect 
to t. Opposite to the other loss functions, the optimal observation times are predominantly towards 
the end of the observation window. 


6 Concluding Remarks 


This paper introduces an extension of the ACE algorithm for Bayesian optimal design so that the 
challenging problem of experimental design for ODE models can now be attempted. 


The method relies on the probabilistic solution to a system of ODEs as recently proposed by Chkrebtii 


et ah (2015) and is demonstrated on four examples where the goal of the experiment is estimating 


unknown physical parameters. 


One key issue not addressed is model discrepancy (Kennedy and O ’Hagan, 2001). This is a systematic 


mis-match between the true physical process and the solution to the ODEs. Not taking account of this 


bias can lead to significant bias in the posterior estimates of the physical parameters (Brynjarsdottir 


and O ’Hagan, 2014). Future work will focus on Bayesian optimal design for physical models including 


model discrepancy. 


A The ACE algorithm 

1. Choose an initial design d° = ■ ■ ■) set the current design to be d*" = (df,..., d^) = 

dF 

2. For each element i = 1,..., g of d; 

(a) Let T*(d) = T(df,..., df_i, d, d^^,..., d^) be the function given by the expected loss 
function which only varies over the design space, "Dj, for the Ah element. 

(b) For j = 1, ... ,Q, evaluate 

for {di,..., dg} G T>i. Fit a GP emulator to {zj, and set Z*(d) to be the resulting 

predictive mean. 

(c) Find 

d* = argmind6D,L*(d), 

and let d* = (df,..., df_^, d*, ., d^) be the proposed design. 

(d) For i = j,..., B, set 

Af = A(V>f,yf,d^), 
a; = A(V>;,y*,d*), 

where and ^ are samples generated from tp,y\dF and 'i/j,y|d*, 

respectively. 
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(e) Calculate 


p* = 1 — F 


Z^i=j Z^j=l 

^/2Bv 


where F{-) is the distribution function of the t-distribution with 2B — 2 degrees of freedom, 

. E'Li(Af-AT + Ef.i(A--A-)^ 

” 2B-2 

and and A* are the sample means of the A^’s and A*’s, respectively. 

(f) Set d*" = d* with probability p*. 

3. Return to step 2, until convergence. 

Convergence can be assessed informally using trace plots of the evaluations of either A* or A*" at step 
2 (e), dependent on whether the proposed design was accepted or not, respectively. 

The ACE algorithm should be started from multiple different starting designs d°. Out of the resulting 
designs, the one with the lowest value of I/B(d) is returned. 
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